Since microscopy images are widely used in biomedical and materials research, scanning acoustic microscopy (SAM) has been employed for this purpose. Acoustic imaging is an important and well-established method for non-destructive testing (NDT), biomedical imaging, and structural health monitoring. Such imaging is often performed with low-amplitude signals, which can result in noisy images lacking detailed information. In this work, we analyze SAM images acquired from low-amplitude signals and apply a block-matching filter to the time-domain signals to obtain denoised images. We compare the resulting images with those produced by conventional filters applied to the time-domain signals, such as the Gaussian filter, median filter, Wiener filter, and total variation filter. Noteworthy results are presented in this paper.
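A minimal sketch of the baseline comparison described above, using standard SciPy/scikit-image filters on a synthetic image (the paper filters the raw time-domain signals; the test array, noise level, and parameter choices here are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter
from scipy.signal import wiener
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0                               # synthetic sample region
noisy = clean + 0.2 * rng.standard_normal(clean.shape)  # low-amplitude noise

baselines = {
    "gaussian": gaussian_filter(noisy, sigma=1.0),
    "median": median_filter(noisy, size=3),
    "wiener": wiener(noisy, mysize=3),
    "total variation": denoise_tv_chambolle(noisy, weight=0.1),
}
# The block-matching step itself could use, e.g., bm3d.bm3d(noisy, sigma_psd=0.2)
# from the bm3d package (omitted here to keep the sketch dependency-light).
for name, out in baselines.items():
    psnr = 10 * np.log10(1.0 / np.mean((out - clean) ** 2))
    print(f"{name}: PSNR = {psnr:.1f} dB")
```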
Electrostatic actuators offer a promising approach to creating soft robotic sheets owing to their flexible form factor, modular integration, and fast response speed. However, controlling them requires kilovolt signals and an understanding of the complex dynamics arising from the interplay of on-board and environmental forces. In this work, we demonstrate an untethered two-dimensional five-actuator piezoelectric robot powered by batteries and an on-board high-voltage circuit, and controlled through a wireless link. The scalable fabrication approach is based on bonding layers (steel-foil substrate, actuators, flexible electronics) to one another. The robot exhibits a range of controllable motions, including bidirectional crawling (up to ~0.6 cm/s), turning, and in-place rotation (~1 degree/s). High-speed videos and control experiments show that the richness of motion arises from the interaction between the robot's asymmetric mass distribution and the associated dependence of its dynamics on the piezoelectric driving frequency.
Electrically driven soft robots enable small, lightweight bodies together with environmental compatibility, diverse motions, and safe operation. In particular, electrostatic actuators (e.g., piezoelectric actuators) respond quickly. However, methods for scalable, seamless integration and untethered operation remain unclear. Moreover, modeling soft bodies, including their environmental interactions, is a long-standing challenge, and more locomotion mechanisms remain to be explored. In this paper, we design, model, and demonstrate a soft robot that, for the first time, begins to address all of these issues. It features a linear array of five actuators in a planar structure for ease of integration and untethered operation. A new crawling locomotion mechanism relying on pose self-adjustment is designed and validated. The first analytical soft-body model that includes piezoelectricity, gravity, and ground interaction, and that well explains the robot's motion, is developed and validated through experiments. We demonstrate forward and backward motion of the robot and explore the effects of payload and driving speed: the robot moves 1.2 mm per cycle and can carry a payload of up to 200 g (16 times its body weight) while moving. This work paves the way for fast-response robots in complex, unknown environments.
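A back-of-the-envelope check of the reported figures (the driving rate is an assumption, since only the per-cycle displacement is stated above):

```python
payload_g = 200.0
body_mass_g = payload_g / 16      # 16x body weight implies a ~12.5 g body
step_mm = 1.2                     # displacement per actuation cycle (stated)
cycle_rate_hz = 1.0               # assumed driving rate, not stated above
speed_mm_s = step_mm * cycle_rate_hz
print(body_mass_g, speed_mm_s)    # 12.5 (g), 1.2 (mm/s at 1 Hz)
```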
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such robots in domestic settings remains largely a research topic. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and on-screen expressions. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli with motor or audio responses. The emotion detection from speech was not as performant as ERANNs or Zeta Policy learning, but still achieved an accuracy of 63.5%. The video emotion detection system produced results almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm learned extremely rapidly, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
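A minimal sketch of on-policy gait training with PPO via stable-baselines3; the paper's quadruped simulation, reward design, and hyperparameters are not specified here, so the MuJoCo Ant environment stands in for the simulated dog:

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Ant-v4")                   # stand-in quadruped (needs mujoco)
model = PPO("MlpPolicy", env, n_steps=2048, batch_size=64, verbose=0)
model.learn(total_timesteps=100_000)       # fresh on-policy rollouts each update

obs, _ = env.reset()
for _ in range(1000):                      # roll out the learned gait
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```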
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.
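A sketch of the narrations-as-queries idea: each timestamped video-text narration becomes a (query, temporal window) training pair for an NLQ localization model. The fixed window half-width heuristic is an assumption, not the paper's exact recipe:

```python
from dataclasses import dataclass

@dataclass
class NLQSample:
    video_id: str
    query: str       # narration text reused as a free-form query
    start_s: float   # temporal window the model must localize
    end_s: float

def narrations_to_queries(video_id, narrations, half_width_s=2.0):
    """narrations: (timestamp_s, text) pairs from standard video-text data."""
    return [
        NLQSample(video_id, text, max(0.0, t - half_width_s), t + half_width_s)
        for t, text in narrations
    ]

samples = narrations_to_queries("ego_vid_001", [(12.4, "C opens the fridge")])
print(samples[0])  # NLQSample(video_id='ego_vid_001', query='C opens the fridge', ...)
```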
Machine Translation (MT) systems generally aim at the automatic rendering of a source language into a target language, retaining the original context, using various Natural Language Processing (NLP) techniques. Among these methods is Statistical Machine Translation (SMT), which uses probabilistic and statistical techniques to analyze and convert information. This paper describes the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are introduced with a short description relevant to our experimental needs. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is carried out as part of our experiment. Different preprocessing approaches are proposed in this paper to handle noise in the dataset. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim of capturing grammar rules and context-dependent adjustments through a phrase-reordering categorization framework. In our experiment, translation quality is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
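A minimal sketch of the evaluation step, scoring translation output with BLEU via sacrebleu (the hypothesis and reference strings are placeholders; METEOR and RIBES require separate tooling):

```python
import sacrebleu

hypotheses = ["the cat sat on the mat"]           # decoder output (e.g., Moses)
references = [["the cat is sitting on the mat"]]  # one list per reference set
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```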
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
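An illustrative sketch of a motion-forecasting scenario record as described above, with per-actor track histories carrying location, heading, velocity, and category; the field names are assumptions for illustration, not the official av2 devkit schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrackState:
    x_m: float          # map-aligned position
    y_m: float
    heading_rad: float
    velocity_mps: float

@dataclass
class ActorTrack:
    actor_id: str
    category: str       # e.g., "vehicle", "pedestrian"
    is_scored: bool     # models predict future motion only for scored actors
    history: List[TrackState]

track = ActorTrack("a1", "vehicle", True, [TrackState(5.0, 2.0, 0.1, 8.3)])
print(len(track.history), track.is_scored)
```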
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
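A minimal sketch of attention over time steps in the spirit of STCA: a learned weight per acquisition lets discriminative parts of the growing season dominate the pooled feature. This is an illustrative PyTorch module, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TemporalAttentionClassifier(nn.Module):
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)        # attention score per time step
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                          # x: (batch, time, feat_dim)
        attn = torch.softmax(self.score(x), dim=1) # weights over time steps
        pooled = (attn * x).sum(dim=1)             # attention-weighted pooling
        return self.head(pooled)

model = TemporalAttentionClassifier(feat_dim=32, n_classes=2)
logits = model(torch.randn(4, 12, 32))             # 12 acquisitions per season
print(logits.shape)                                # torch.Size([4, 2])
```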
We propose an ensemble approach to predicting the labels in linear programming word problems. Entity identification and meaning representation are the two tasks to be solved in the NL4Opt competition. For the first task, we propose the ensembleCRF method to identify the named entities: in our analysis we found that single models did not improve on this task, so a set of prediction models each predicts the entities, and the generated results are combined to form a consensus result. For the second task, we present an ensemble text generator to produce the representation sentences; due to overflow in the output, we divide the problem into multiple small tasks. A single model generates different representations based on the prompt, and all the generated text is combined to form an ensemble and produce the mathematical meaning of the linear programming problem.
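A sketch of the consensus step in ensembleCRF: per-token entity tags from several models are combined by majority vote (the tie-breaking rule and the example tags are assumptions for illustration):

```python
from collections import Counter

def consensus_tags(model_outputs):
    """model_outputs: list of tag sequences, one per model, aligned by token."""
    return [
        Counter(tags_at_token).most_common(1)[0][0]
        for tags_at_token in zip(*model_outputs)
    ]

preds = [
    ["B-LIMIT", "O", "B-VAR"],      # model 1
    ["B-LIMIT", "O", "O"],          # model 2
    ["B-LIMIT", "B-VAR", "B-VAR"],  # model 3
]
print(consensus_tags(preds))        # ['B-LIMIT', 'O', 'B-VAR']
```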
Previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include the makes of different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging, as it contains vehicles in complex traffic scenarios with intra-class and inter-class variations in type, scale, pose, occlusion, and lighting conditions. Current object detectors such as YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on the FGVD dataset, we also present the results of combining an existing detector with the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
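A sketch of the two-stage pipeline described above: an off-the-shelf detector proposes vehicle boxes and a hierarchical classifier assigns each crop a three-level fine-grained label; both stages are placeholder stubs, and the example labels are assumptions:

```python
def detect_vehicles(image):
    # Stand-in for a detector such as YOLOv5: returns (box, crop) pairs.
    return [((10, 20, 110, 90), image)]

def hierarchical_classify(crop):
    # Stand-in for an HRN-style classifier over the three-level hierarchy.
    return ("two-wheeler", "Honda", "Activa")

def fine_grained_detection(image):
    results = []
    for box, crop in detect_vehicles(image):
        results.append({"box": box, "labels": hierarchical_classify(crop)})
    return results

print(fine_grained_detection(image="dummy_image"))
```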